File Requirements
Most documents created on a Mac OS-based system use a richer text model than pure Unicode, so the emphasis here is on easy interchange with other platforms. In particular, an application should be able to
- import Unicode plain text files from other platforms, and export them to other platforms, with no data loss
- easily import a Unicode plain text file into a rich text environment
File Types
The file type 'utxt' has been registered for UTF-16 plain text documents. The (optional) scrap type 'utxt' is also registered for UTF-16 Clipboard text.
Whether it is useful to register a file type or scrap type for UTF-8 text is currently under discussion. Like other documents and text that use WorldScript encodings, plain UTF-8 documents could use the file and scrap type 'TEXT'. UTF-8 is compatible with the assumptions that govern WorldScript encodings: 'TEXT' files and Clipboard contents do not specifically identify which of these encodings they use.
File Content
A plain text Unicode document, in a file or on the Clipboard, can contain any valid character from Unicode 2.0 or later. In particular, it can contain control characters in the ranges U+0000 through U+001F and U+0080 through U+009F. It may also contain codes in the Corporate and Private Use Zones, although these may not interchange properly.
The byte-order mark U+FEFF may be present at the beginning of the content. If it is absent in UTF-16 content, big-endian order is assumed.
Creating Content
When creating file content, write line and paragraph separators using the Unicode characters intended for this purpose, U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR), instead of some combination of CR and LF. This makes the content more portable: when the content is read on a particular platform, these Unicode separators can be converted to the separators customary for that platform.
Reading Content
When reading file content, accept the Unicode line and paragraph separators and treat them as such. In addition, treat any of the following as paragraph separators: LF, CR, and CRLF.
When converting content to Mac OS encodings, set the kUnicodeLooseMappingsBit control flag. (You may use other control bits in addition to this one.)
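The File Content, Creating Content, and Reading Content rules fit together as a round trip. The following Python sketch is only an illustration under stated assumptions, not the Mac OS implementation. It writes big-endian UTF-16 with a leading U+FEFF and joins paragraphs with U+2029. On reading, it honors either byte-order mark, falls back to big-endian when none is present, and treats U+2029, LF, CR, and CRLF as paragraph separators. All function names are hypothetical. to_mac_text assumes CR is the customary Mac OS separator for both lines and paragraphs. The sketch does not perform the Text Encoding Conversion Manager conversion to Mac OS encodings, where kUnicodeLooseMappingsBit applies.

```python
import codecs

LINE_SEP = "\u2028"  # U+2028 LINE SEPARATOR
PARA_SEP = "\u2029"  # U+2029 PARAGRAPH SEPARATOR


def encode_utf16_document(paragraphs, bom=True):
    """Hypothetical writer: join paragraphs with U+2029, encode as UTF-16BE.

    Assumes any line breaks inside a paragraph already use U+2028, not CR/LF.
    """
    data = PARA_SEP.join(paragraphs).encode("utf-16-be")
    return codecs.BOM_UTF16_BE + data if bom else data


def decode_utf16_document(data):
    """Hypothetical reader: honor an optional BOM, else assume big-endian."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No byte-order mark: big-endian order is assumed.
    return data.decode("utf-16-be")


def split_paragraphs(text):
    """Split on U+2029 and on LF, CR, or CRLF, all treated as paragraph separators.

    U+2028 line separators are left inside the paragraphs.
    """
    # Replace CRLF first so it counts as one separator rather than two.
    for sep in ("\r\n", "\r", "\n"):
        text = text.replace(sep, PARA_SEP)
    return text.split(PARA_SEP)


def to_mac_text(paragraphs):
    """Hypothetical conversion to the customary Mac OS separator (CR).

    Assumption: CR is used for both line and paragraph breaks, so the
    distinction between U+2028 and U+2029 is not preserved.
    """
    return "\r".join(p.replace(LINE_SEP, "\r") for p in paragraphs)
```

A short round trip: encode_utf16_document(["Hello\u2028world", "Second"]) yields bytes that begin with FE FF. Passing them through decode_utf16_document and split_paragraphs gives back the two paragraphs. Decoding the bytes explicitly, rather than relying on a generic codec's default byte order, keeps the big-endian fallback visible.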